42 research outputs found

Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext

Author: Gimpel Kevin
Mallinson Jonathan
Wieting John
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases via back-translation of bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to serve as training data for learning paraphrastic sentence embeddings. We find that the data quality is stronger than prior work based on bitext and on par with manually-written English paraphrase pairs, with the advantage that our approach can scale up to generate large training sets for many languages and domains. We experiment with several language pairs and data sources, and develop a variety of data filtering techniques. In the process, we explore how neural machine translation output differs from human-written sentences, finding clear differences in length, the amount of repetition, and the use of rare words.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Zero-Shot Crosslingual Sentence Simplification

Author: Lapata Mirella
Mallinson Jonathan
Sennrich Rico
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with encoder-decoder models trained on large amounts of parallel data, which often exists only in English. We propose a zero-shot modeling framework which transfers simplification knowledge from English to another language (for which no parallel simplification corpus exists) while generalizing across languages and tasks. A shared transformer encoder constructs language-agnostic representations, with a combination of task-specific encoder layers added on top (e.g., for translation and simplification). Empirical results using both human and automatic metrics show that our approach produces better simplifications than unsupervised and pivot-based methods.

Crossref

Edinburgh Research Explorer

ZORA

Universal rewriting via machine translation

Author: Mallinson Jonathan
Publication venue: The University of Edinburgh
Publication date: 30/11/2021

Natural language allows for the same meaning (semantics) to be expressed in multiple different ways, i.e. paraphrasing. This thesis examines automatic approaches for paraphrasing, focusing on three paraphrasing subtasks: unconstrained paraphrasing, where there are no constraints on the output; simplification, where the output must be simpler than the input; and text compression, where the output must be shorter than the input. Whilst we can learn paraphrasing from supervised data, this data is sparse and expensive to create. This thesis is concerned with the use of transfer learning to improve paraphrasing when there is no supervised data. In particular, we address the following question: can transfer learning be used to overcome a lack of paraphrasing data? To answer this question we split it into three subquestions: (1) No supervised data exists for a specific paraphrasing task; can bilingual data be used as a source of training data for paraphrasing? (2) Supervised paraphrasing data exists in one language but not in another; can bilingual data be used to transfer paraphrasing training data from one language to another? (3) Can the output of encoder-decoder paraphrasing models be controlled?

Edinburgh Research Archive

Teaching Small Language Models to Reason

Author: Adamek Jakub
Magister Lucie Charlotte
Mallinson Jonathan
Malmi Eric
Severyn Aliaksei
Publication date: 01/06/2023

Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state-of-the-art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with a size of over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with fewer than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on PaLM-540B generated chains of thought.

arXiv.org e-Print Archive

Sentence Compression for Arbitrary Languages via Multilingual Pivoting

Author: Lapata Maria
Mallinson Jonathan
Sennrich Rico
Publication date: 01/01/2018

Crossref

Edinburgh Research Explorer

Paraphrasing Revisited with Neural Machine Translation

Author: Lapata Maria
Mallinson Jonathan
Sennrich Rico
Publication date: 07/04/2017

Edinburgh Research Explorer

Small Language Models Improve Giants by Rewriting Their Outputs

Author: Adamek Jakub
Bražinskas Arthur
Mallinson Jonathan
Malmi Eric
Severyn Aliaksei
Vernikos Giorgos
Publication date: 22/05/2023

Large language models (LLMs) have demonstrated impressive few-shot learning capabilities, but they often underperform compared to fine-tuned models on challenging tasks. Furthermore, their large size and restricted access only through APIs make task-specific fine-tuning impractical. Moreover, LLMs are sensitive to different aspects of prompts (e.g., the selection and order of demonstrations) and can thus require time-consuming prompt engineering. In this light, we propose a method to correct LLM outputs without relying on their weights. First, we generate a pool of candidates by few-shot prompting an LLM. Second, we refine the LLM-generated outputs using a smaller model, the LM-corrector (LMCor), which is trained to rank, combine and rewrite the candidates to produce the final target output. Our experiments demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B) across diverse tasks. Moreover, we illustrate that the LMCor exhibits robustness against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we showcase that the LMCor can be seamlessly integrated with different LLMs at inference time, serving as a plug-and-play module to improve their performance.

arXiv.org e-Print Archive

Opsoclonus-Myoclonus Presenting With Features of Spasmus Nutans

Author: Chrousos GA
Iqbal N. Allarakhia
Jonathan D. Trobe
Katzman B.
Koenig SB
Mallinson AI
Publication venue: SAGE Publications
Publication date: 01/01/1995

Peer Reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/66540/2/10.1177_088307389501000117.pd

Crossref

Deep Blue Documents at the University of Michigan